This is a 21-plex experiment using 10-plex TMT with no reference channels per plex. Each plex is a set of biological replicates of mouse livers from the diversity outbred mice (192 mice livers with some repeated samples). Half the mice were fed a high-fat diet, and the other half were fed standard chow. There were equal numbers of male and female mice (one each per cross). Samples were randomly assigned to plexes to try and balance diet and sex. Data were downloaded from PRIDE (PXD002801) and processed with the PAW Pipeline (Wilmarth 2009) and IRS normalized using the average of each plex as a mock reference channel. This is the publication:
Chick, J.M., Munger, S.C., Simecek, P., Huttlin, E.L., Choi, K., Gatti, D.M., Raghupathy, N., Svenson, K.L., Churchill, G.A. and Gygi, S.P., 2016. Defining the consequences of genetic variation on a proteome-wide scale. Nature, 534(7608), pp.500-505.
This QC notebook will address these questions:
The sample key is a bit much for a table here. This is a somewhat balanced study design, so the average intensities for each protein in each plex might work for IRS.
Wilmarth, P.A., Riviere, M.A. and David, L.L., 2009. Techniques for accurate protein identification in shotgun proteomic studies of human, mouse, bovine, and chicken lenses. Journal of ocular biology, diseases, and informatics, 2(4), pp.223-234.
Plubell, D.L., Wilmarth, P.A., Zhao, Y., Fenton, A.M., Minnier, J., Reddy, A.P., Klimek, J., Yang, X., David, L.L. and Pamir, N., 2017. Extended multiplexing of tandem mass tags (TMT) labeling reveals age and high fat diet specific proteome changes in mouse epididymal adipose tissue. Molecular & Cellular Proteomics, 16(5), pp.873-890.
# library imports
library(tidyverse)
library(scales)
library(limma)
library(edgeR)
library(psych)
── Attaching core tidyverse packages ──────────────────────────────────────────────────────────────────────────── tidyverse 2.0.0 ── ✔ dplyr 1.1.4 ✔ readr 2.1.5 ✔ forcats 1.0.0 ✔ stringr 1.5.1 ✔ ggplot2 3.5.1 ✔ tibble 3.2.1 ✔ lubridate 1.9.3 ✔ tidyr 1.3.1 ✔ purrr 1.0.2 ── Conflicts ────────────────────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ── ✖ dplyr::filter() masks stats::filter() ✖ dplyr::lag() masks stats::lag() ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors Attaching package: ‘scales’ The following object is masked from ‘package:purrr’: discard The following object is masked from ‘package:readr’: col_factor Attaching package: ‘psych’ The following objects are masked from ‘package:scales’: alpha, rescale The following objects are masked from ‘package:ggplot2’: %+%, alpha
# ================== TMM normalization from DGEList object =====================
apply_tmm_factors <- function(y, color = NULL, plot = TRUE) {
# computes the tmm normalized data from the DGEList object
# y - DGEList object
# returns a dataframe with normalized intensities
# compute and print "Sample loading" normalization factors
lib_facs <- mean(y$samples$lib.size) / y$samples$lib.size
cat("\nLibrary size factors:\n",
sprintf("%-5s -> %f\n", colnames(y$counts), lib_facs))
# compute and print TMM normalization factors
tmm_facs <- 1/y$samples$norm.factors
cat("\nTrimmed mean of M-values (TMM) factors:\n",
sprintf("%-5s -> %f\n", colnames(y$counts), tmm_facs))
# compute and print the final correction factors
norm_facs <- lib_facs * tmm_facs
cat("\nCombined (lib size and TMM) normalization factors:\n",
sprintf("%-5s -> %f\n", colnames(y$counts), norm_facs))
# compute the normalized data as a new data frame
tmt_tmm <- as.data.frame(sweep(y$counts, 2, norm_facs, FUN = "*"))
colnames(tmt_tmm) <- str_c(colnames(y$counts), "_tmm")
# visualize results and return data frame
if(plot == TRUE) {
boxplot(log10(tmt_tmm), col = color, notch = TRUE, main = "TMM Normalized data")
}
tmt_tmm
}
# ============== CV function ===================================================
CV <- function(df) {
# Computes CVs of data frame rows
# df - data frame,
# returns vector of CVs (%)
ave <- rowMeans(df) # compute averages
sd <- apply(df, 1, sd) # compute standard deviations
cv <- 100 * sd / ave # compute CVs in percent (last thing gets returned)
}
# =========== Boxplot with median label ========================================
labeled_boxplot <- function(df, ylim, title) {
# Makes a box plot with the median value labeled
# df - data frame with data to compute CVs of
# ylim - upper limit for y-axis
# title - plot title
cv = CV(df)
boxplot(cv, ylim = c(0, ylim), notch = TRUE, main = title)
text(x = 0.65, y = boxplot.stats(cv)$stats[3],
labels = round(boxplot.stats(cv)$stats[3], 1))
}
set_plot_dimensions <- function(width_choice, height_choice) {
options(repr.plot.width=width_choice, repr.plot.height=height_choice)
}
The input data for the notebook was prepped in Excel from the IRS script output. The protein table excludes decoys, contaminants, some major blood proteins, and any proteins not seen in all 4 plexes.
# load the IRS-normalized data and check the table
data_irs <- read_tsv("quant_table.txt")
# save gene names for edgeR so we can double check that results line up
accessions <- data_irs$Accession
# see how many rows of data we have
length(accessions)
Rows: 4804 Columns: 232 ── Column specification ──────────────────────────────────────────────────────────────────────────────────────────────────────────── Delimiter: "\t" chr (1): Accession dbl (231): DO_1MH_1, DO_2MH_1, DO_3FH_1, DO_4MH_1, DO_5MS_1, DO_6FS_1, DO_7M... ℹ Use `spec()` to retrieve the full column specification for this data. ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
The publication considered 4 groups by diet and sex. We will group the same way for this notebook.
# define the groups (after IRS)
FH_ <- select(data_irs, contains("FH_"))
cat("FH:", length(names(FH_)), "\n")
FS_ <- select(data_irs, contains("FS_"))
cat("FS:", length(names(FS_)), "\n")
MH_ <- select(data_irs, contains("MH_"))
cat("MH:", length(names(MH_)), "\n")
MS_ <- select(data_irs, contains("MS_"))
cat("MS:", length(names(MS_)), "\n")
FH: 50 FS: 51 MH: 58 MS: 51
We will put the organized data into a new data frame, define some column indexes for each biological group, and set colors for plotting.
# put groups together into a single data frame
tmt_irs <- bind_cols(FH_, FS_, MH_, MS_)
# data.frame(col_num = 1:ncol(tmt_irs), variable = colnames(tmt_irs))
# define which columns go with each group
FH <- 1:50
FS <- 51:101
MH <- 102:159
MS <- 160:210
# set some colors by group
colors_group = c(rep('dark red', 50), rep('pink', 51), rep('dark blue', 58), rep('light blue', 51))
We would expect to see clustering by plex before adjusting intensities with the IRS method. This experiment does not have as balanced samples per plex as the other smaller two experiments did. The plex batch-like effect should be removed by the IRS method and the clustering after IRS should be by biological groups. If the biological group clustering cannot be recovered, the mock reference channel approach may not be appropriate for this data.
# make default plot sizes a little bigger
set_plot_dimensions(9, 9)
# boxplots after IRS and before TMM
boxplot(log10(tmt_irs), col = colors_group, notch = TRUE, main = "After IRS (no TMM)")
# check clustering before IRS
plotMDS(log2(tmt_irs), main = "After IRS (no TMM)", col = colors_group)
We will load the data into edgeR data structures and call the calcNormFactors function to perform library size and the trimmed mean of M-values (TMM) normalization to correct for any sample composition differences. We will double check if the TMM normalization changed the clustering that we had above.
Robinson, M.D. and Oshlack, A., 2010. A scaling normalization method for differential expression analysis of RNA-seq data. Genome biology, 11(3), p.R25.
We need to use the edgeR normalization factors to produce the TMM normalized data that the statistical testing will be working with. EdgeR uses the normalization factors in its statistical modeling but does not output the normalized intensities. We compute the normalized intensities with the apply_tmm_factors function.
# get the biological sample data into a DGEList object
group = c(rep("FH", length(FH)), rep("FS", length(FS)),
rep("MH", length(MH)), rep("MS", length(MS)))
y <- DGEList(counts = tmt_irs, group = group, genes = accessions)
# run TMM normalization (also includes a library size factor)
y <- calcNormFactors(y)
# compute the TMM-normalized intensities
tmt_tmm <- apply_tmm_factors(y, color = colors_group)
# write the TMM-normalized intensities to file
write.table(cbind(accessions,tmt_tmm), "DO_IRS-plus-TMM.txt", sep = "\t",
row.names = FALSE, na = " ")
# check the clustering
plotMDS(y, col = colors_group, main = "After IRS and TMM")
Library size factors: DO_3FH_1 -> 1.006862 DO_12FH_2 -> 0.995895 DO_14FH_2 -> 0.980331 DO_17FH_2 -> 0.985833 DO_26FH_3 -> 0.991650 DO_38FH_5 -> 0.994687 DO_40FH_5 -> 1.000440 DO_42FH_5 -> 1.012346 DO_47FH_6 -> 1.024000 DO_49FH_6 -> 1.004147 DO_55FH_6 -> 1.004504 DO_58FH_7 -> 1.003737 DO_60FH_7 -> 0.998770 DO_61FH_7 -> 1.003351 DO_68FH_8 -> 1.006505 DO_69FH_8 -> 0.992929 DO_75FH_9 -> 1.004702 DO_78FH_9 -> 1.002452 DO_83FH_10 -> 1.001996 DO_87FH_10 -> 1.000672 DO_88FH_10 -> 0.999882 DO_90FH_10 -> 1.000161 DO_94FH_11 -> 1.007224 DO_95FH_11 -> 1.001026 DO_96FH_11 -> 1.009083 DO_100FH_11 -> 1.001989 DO_104FH_12 -> 1.001875 DO_106FH_12 -> 0.997668 DO_108FH_12 -> 1.010022 DO_114FH_13 -> 0.990233 DO_120FH_14 -> 0.978154 DO_121FH_14 -> 1.023491 DO_133FH_15 -> 1.011545 DO_140FH_16 -> 0.996985 DO_141FH_16 -> 1.016052 DO_142FH_16 -> 0.996990 DO_143FH_16 -> 0.990118 DO_147FH_16 -> 1.000110 DO_149FH_16 -> 0.994751 DO_150FH_16 -> 1.004654 DO_154FH_17 -> 0.991000 DO_159FH_17 -> 0.995069 DO_170FH_18 -> 1.008897 DO_183FH_20 -> 0.991863 DO_184FH_20 -> 0.984431 DO_42FH_21 -> 0.984866 DO_145FH_21 -> 0.988509 DO_191FH_21 -> 1.019752 DO_12FH_21 -> 0.993941 DO_194FH_21 -> 1.005758 DO_6FS_1 -> 0.997116 DO_9FS_1 -> 0.990961 DO_10FS_1 -> 0.992491 DO_18FS_2 -> 0.995832 DO_19FS_2 -> 1.000139 DO_21FS_3 -> 0.998206 DO_22FS_3 -> 0.995127 DO_23FS_3 -> 0.992175 DO_32FS_4 -> 0.995651 DO_33FS_4 -> 1.003990 DO_34FS_4 -> 0.997206 DO_37FS_4 -> 1.005611 DO_41FS_5 -> 0.995850 DO_44FS_5 -> 0.999926 DO_45FS_5 -> 1.004522 DO_51FS_6 -> 0.996750 DO_53FS_6 -> 0.996751 DO_54FS_6 -> 0.995179 DO_59FS_7 -> 0.992742 DO_62FS_7 -> 1.003516 DO_64FS_7 -> 1.010183 DO_66FS_8 -> 0.997526 DO_72FS_8 -> 0.993821 DO_74FS_9 -> 0.999856 DO_77FS_9 -> 0.994943 DO_79FS_9 -> 1.002702 DO_85FS_10 -> 0.997301 DO_93FS_11 -> 1.005725 DO_99FS_11 -> 1.003018 DO_101FS_12 -> 0.994057 DO_105FS_12 -> 0.999136 DO_112FS_13 -> 0.991125 DO_116FS_13 -> 0.992739 DO_122FS_14 -> 1.004199 DO_128FS_14 -> 1.008352 DO_130FS_15 -> 1.004014 DO_131FS_15 -> 1.012238 DO_132FS_15 -> 0.993844 DO_146FS_16 -> 0.998074 DO_151FS_17 -> 1.001838 DO_152FS_17 -> 1.000045 DO_153FS_17 -> 1.004869 DO_161FS_18 -> 0.983602 DO_163FS_18 -> 1.039877 DO_167FS_18 -> 0.997462 DO_174FS_19 -> 0.985897 DO_179FS_19 -> 1.000652 DO_180FS_19 -> 0.999922 DO_188FS_20 -> 0.987578 DO_62FS_21 -> 1.011947 DO_193FS_21 -> 0.994444 DO_1MH_1 -> 0.993601 DO_2MH_1 -> 1.007922 DO_4MH_1 -> 1.016413 DO_7MH_1 -> 1.005489 DO_16MH_2 -> 1.009300 DO_4MH_2 -> 1.000929 DO_24MH_3 -> 1.004842 DO_25MH_3 -> 1.025246 DO_27MH_3 -> 1.010156 DO_4MH_3 -> 0.983766 DO_31MH_4 -> 1.009047 DO_35MH_4 -> 1.002238 DO_4MH_4 -> 0.993296 DO_39MH_5 -> 0.988816 DO_43MH_5 -> 1.014687 DO_4MH_5 -> 0.976842 DO_50MH_6 -> 1.016051 DO_4MH_6 -> 0.985780 DO_57MH_7 -> 0.996428 DO_63MH_7 -> 0.995516 DO_4MH_7 -> 1.001048 DO_67MH_8 -> 0.999535 DO_71MH_8 -> 0.995276 DO_4MH_8 -> 0.989023 DO_82MH_9 -> 1.000436 DO_4MH_9 -> 0.994999 DO_84MH_10 -> 1.002614 DO_86MH_10 -> 0.999922 DO_89MH_10 -> 1.001183 DO_4MH_10 -> 1.000843 DO_97MH_11 -> 0.990699 DO_4MH_11 -> 0.988826 DO_103MH_12 -> 0.998726 DO_107MH_12 -> 1.002153 DO_4MH_12 -> 0.994317 DO_110MH_13 -> 1.000987 DO_111MH_13 -> 1.000765 DO_117MH_13 -> 1.003283 DO_119MH_13 -> 1.008566 DO_125MH_14 -> 1.002970 DO_129MH_14 -> 0.991356 DO_134MH_15 -> 0.988269 DO_138MH_15 -> 1.004762 DO_148MH_16 -> 1.009571 DO_162MH_18 -> 0.977130 DO_165MH_18 -> 0.998380 DO_168MH_18 -> 1.007174 DO_169MH_18 -> 1.028101 DO_172MH_19 -> 1.015569 DO_175MH_19 -> 0.996073 DO_178MH_19 -> 0.989610 DO_181MH_20 -> 0.994278 DO_182MH_20 -> 1.007589 DO_185MH_20 -> 1.007334 DO_186MH_20 -> 1.005991 DO_189MH_20 -> 1.010115 DO_4MH_21 -> 1.007476 DO_192MH_21 -> 1.011334 DO_5MS_1 -> 0.993441 DO_8MS_1 -> 0.999286 DO_11MS_2 -> 1.001607 DO_13MS_2 -> 1.000930 DO_15MS_2 -> 1.017289 DO_20MS_3 -> 0.999724 DO_28MS_3 -> 0.997886 DO_29MS_4 -> 0.997317 DO_30MS_4 -> 1.002272 DO_36MS_4 -> 0.996318 DO_46MS_5 -> 1.002634 DO_48MS_6 -> 0.993973 DO_52MS_6 -> 0.990713 DO_56MS_7 -> 0.994381 DO_65MS_8 -> 1.006982 DO_70MS_8 -> 1.005026 DO_73MS_8 -> 1.015562 DO_76MS_9 -> 1.001115 DO_80MS_9 -> 1.004105 DO_81MS_9 -> 0.995988 DO_91MS_10 -> 0.997776 DO_92MS_11 -> 0.992305 DO_98MS_11 -> 0.995183 DO_102MS_12 -> 0.994520 DO_109MS_12 -> 1.005589 DO_113MS_13 -> 1.011214 DO_115MS_13 -> 1.003191 DO_118MS_13 -> 1.008615 DO_123MS_14 -> 0.998825 DO_124MS_14 -> 1.001904 DO_126MS_14 -> 0.996437 DO_127MS_14 -> 0.994663 DO_135MS_15 -> 0.990141 DO_136MS_15 -> 1.000797 DO_137MS_15 -> 0.978433 DO_139MS_15 -> 1.007136 DO_144MS_16 -> 0.996668 DO_155MS_17 -> 0.997987 DO_156MS_17 -> 1.004332 DO_157MS_17 -> 1.007964 DO_158MS_17 -> 0.996084 DO_160MS_17 -> 1.007634 DO_164MS_18 -> 0.977132 DO_166MS_18 -> 0.985102 DO_171MS_19 -> 1.012180 DO_173MS_19 -> 1.004120 DO_176MS_19 -> 1.001449 DO_177MS_19 -> 1.000497 DO_187MS_20 -> 0.995924 DO_190MS_20 -> 1.014378 DO_52MS_21 -> 0.987854 Trimmed mean of M-values (TMM) factors: DO_3FH_1 -> 0.984637 DO_12FH_2 -> 0.970631 DO_14FH_2 -> 0.967783 DO_17FH_2 -> 1.027854 DO_26FH_3 -> 1.036544 DO_38FH_5 -> 1.014480 DO_40FH_5 -> 1.034588 DO_42FH_5 -> 0.966728 DO_47FH_6 -> 0.908890 DO_49FH_6 -> 1.004842 DO_55FH_6 -> 1.001417 DO_58FH_7 -> 1.003026 DO_60FH_7 -> 0.964322 DO_61FH_7 -> 0.979063 DO_68FH_8 -> 1.064563 DO_69FH_8 -> 0.984809 DO_75FH_9 -> 1.023070 DO_78FH_9 -> 0.977882 DO_83FH_10 -> 0.965629 DO_87FH_10 -> 0.991698 DO_88FH_10 -> 1.012273 DO_90FH_10 -> 1.018005 DO_94FH_11 -> 0.979385 DO_95FH_11 -> 0.979260 DO_96FH_11 -> 0.977134 DO_100FH_11 -> 0.979838 DO_104FH_12 -> 0.946883 DO_106FH_12 -> 0.956345 DO_108FH_12 -> 1.013455 DO_114FH_13 -> 0.994230 DO_120FH_14 -> 1.006590 DO_121FH_14 -> 0.985023 DO_133FH_15 -> 0.956319 DO_140FH_16 -> 1.014897 DO_141FH_16 -> 0.928495 DO_142FH_16 -> 1.004525 DO_143FH_16 -> 1.002125 DO_147FH_16 -> 0.986917 DO_149FH_16 -> 0.999223 DO_150FH_16 -> 1.014697 DO_154FH_17 -> 1.021475 DO_159FH_17 -> 1.014649 DO_170FH_18 -> 0.953876 DO_183FH_20 -> 1.034699 DO_184FH_20 -> 1.009735 DO_42FH_21 -> 0.981690 DO_145FH_21 -> 1.029946 DO_191FH_21 -> 1.005099 DO_12FH_21 -> 0.992890 DO_194FH_21 -> 0.940080 DO_6FS_1 -> 1.033892 DO_9FS_1 -> 0.963997 DO_10FS_1 -> 1.009481 DO_18FS_2 -> 0.989701 DO_19FS_2 -> 1.036582 DO_21FS_3 -> 1.022705 DO_22FS_3 -> 1.063323 DO_23FS_3 -> 1.063819 DO_32FS_4 -> 0.995780 DO_33FS_4 -> 0.991048 DO_34FS_4 -> 1.041301 DO_37FS_4 -> 0.972501 DO_41FS_5 -> 1.031427 DO_44FS_5 -> 0.985643 DO_45FS_5 -> 1.000516 DO_51FS_6 -> 1.010010 DO_53FS_6 -> 1.013910 DO_54FS_6 -> 1.026980 DO_59FS_7 -> 0.967873 DO_62FS_7 -> 1.067026 DO_64FS_7 -> 1.076214 DO_66FS_8 -> 1.029944 DO_72FS_8 -> 1.017481 DO_74FS_9 -> 0.981226 DO_77FS_9 -> 1.019004 DO_79FS_9 -> 0.965211 DO_85FS_10 -> 1.003304 DO_93FS_11 -> 0.973536 DO_99FS_11 -> 1.025248 DO_101FS_12 -> 1.020993 DO_105FS_12 -> 1.011515 DO_112FS_13 -> 1.023311 DO_116FS_13 -> 1.013065 DO_122FS_14 -> 1.014019 DO_128FS_14 -> 0.954047 DO_130FS_15 -> 1.038281 DO_131FS_15 -> 1.045136 DO_132FS_15 -> 0.958723 DO_146FS_16 -> 1.051391 DO_151FS_17 -> 1.028580 DO_152FS_17 -> 1.002740 DO_153FS_17 -> 0.978550 DO_161FS_18 -> 1.079662 DO_163FS_18 -> 0.953603 DO_167FS_18 -> 0.981736 DO_174FS_19 -> 1.009430 DO_179FS_19 -> 0.969062 DO_180FS_19 -> 0.993433 DO_188FS_20 -> 1.064875 DO_62FS_21 -> 1.050789 DO_193FS_21 -> 0.965976 DO_1MH_1 -> 0.998195 DO_2MH_1 -> 0.999468 DO_4MH_1 -> 1.037362 DO_7MH_1 -> 0.981022 DO_16MH_2 -> 0.959746 DO_4MH_2 -> 1.044157 DO_24MH_3 -> 0.942848 DO_25MH_3 -> 0.881993 DO_27MH_3 -> 0.960091 DO_4MH_3 -> 1.045901 DO_31MH_4 -> 0.931341 DO_35MH_4 -> 1.024665 DO_4MH_4 -> 1.007464 DO_39MH_5 -> 1.020780 DO_43MH_5 -> 0.907584 DO_4MH_5 -> 1.102616 DO_50MH_6 -> 0.936426 DO_4MH_6 -> 1.080914 DO_57MH_7 -> 0.986396 DO_63MH_7 -> 0.969432 DO_4MH_7 -> 1.093863 DO_67MH_8 -> 0.933502 DO_71MH_8 -> 0.925085 DO_4MH_8 -> 1.108611 DO_82MH_9 -> 1.007587 DO_4MH_9 -> 0.981486 DO_84MH_10 -> 1.021023 DO_86MH_10 -> 1.009195 DO_89MH_10 -> 0.988643 DO_4MH_10 -> 1.024197 DO_97MH_11 -> 0.982032 DO_4MH_11 -> 1.057607 DO_103MH_12 -> 0.946842 DO_107MH_12 -> 1.037748 DO_4MH_12 -> 1.043064 DO_110MH_13 -> 0.999426 DO_111MH_13 -> 0.995160 DO_117MH_13 -> 0.969681 DO_119MH_13 -> 0.953735 DO_125MH_14 -> 0.986235 DO_129MH_14 -> 0.997761 DO_134MH_15 -> 1.074754 DO_138MH_15 -> 1.002864 DO_148MH_16 -> 1.005994 DO_162MH_18 -> 1.082478 DO_165MH_18 -> 1.014983 DO_168MH_18 -> 1.003529 DO_169MH_18 -> 0.871072 DO_172MH_19 -> 1.041104 DO_175MH_19 -> 1.015518 DO_178MH_19 -> 0.948769 DO_181MH_20 -> 0.971105 DO_182MH_20 -> 0.959945 DO_185MH_20 -> 0.960447 DO_186MH_20 -> 1.005621 DO_189MH_20 -> 0.943917 DO_4MH_21 -> 1.020574 DO_192MH_21 -> 1.009096 DO_5MS_1 -> 1.033740 DO_8MS_1 -> 0.948307 DO_11MS_2 -> 1.012971 DO_13MS_2 -> 1.038838 DO_15MS_2 -> 0.982412 DO_20MS_3 -> 0.997720 DO_28MS_3 -> 0.978608 DO_29MS_4 -> 1.024003 DO_30MS_4 -> 0.972469 DO_36MS_4 -> 1.013167 DO_46MS_5 -> 0.972493 DO_48MS_6 -> 1.024515 DO_52MS_6 -> 1.018990 DO_56MS_7 -> 0.970391 DO_65MS_8 -> 1.020042 DO_70MS_8 -> 0.960172 DO_73MS_8 -> 1.056735 DO_76MS_9 -> 0.989547 DO_80MS_9 -> 0.984221 DO_81MS_9 -> 1.036821 DO_91MS_10 -> 0.956703 DO_92MS_11 -> 1.048526 DO_98MS_11 -> 0.997970 DO_102MS_12 -> 1.024650 DO_109MS_12 -> 1.035777 DO_113MS_13 -> 1.020010 DO_115MS_13 -> 0.970438 DO_118MS_13 -> 0.986983 DO_123MS_14 -> 0.994689 DO_124MS_14 -> 0.988876 DO_126MS_14 -> 1.029977 DO_127MS_14 -> 1.036642 DO_135MS_15 -> 0.966275 DO_136MS_15 -> 0.999988 DO_137MS_15 -> 1.012264 DO_139MS_15 -> 0.946462 DO_144MS_16 -> 0.972595 DO_155MS_17 -> 0.948379 DO_156MS_17 -> 0.995356 DO_157MS_17 -> 1.010819 DO_158MS_17 -> 1.013181 DO_160MS_17 -> 0.976844 DO_164MS_18 -> 1.074727 DO_166MS_18 -> 1.012637 DO_171MS_19 -> 1.006474 DO_173MS_19 -> 1.003007 DO_176MS_19 -> 0.994819 DO_177MS_19 -> 1.042915 DO_187MS_20 -> 1.041614 DO_190MS_20 -> 0.974922 DO_52MS_21 -> 1.020963 Combined (lib size and TMM) normalization factors: DO_3FH_1 -> 0.991393 DO_12FH_2 -> 0.966647 DO_14FH_2 -> 0.948747 DO_17FH_2 -> 1.013293 DO_26FH_3 -> 1.027889 DO_38FH_5 -> 1.009089 DO_40FH_5 -> 1.035043 DO_42FH_5 -> 0.978664 DO_47FH_6 -> 0.930703 DO_49FH_6 -> 1.009009 DO_55FH_6 -> 1.005927 DO_58FH_7 -> 1.006774 DO_60FH_7 -> 0.963136 DO_61FH_7 -> 0.982343 DO_68FH_8 -> 1.071488 DO_69FH_8 -> 0.977845 DO_75FH_9 -> 1.027881 DO_78FH_9 -> 0.980280 DO_83FH_10 -> 0.967557 DO_87FH_10 -> 0.992365 DO_88FH_10 -> 1.012153 DO_90FH_10 -> 1.018170 DO_94FH_11 -> 0.986459 DO_95FH_11 -> 0.980264 DO_96FH_11 -> 0.986008 DO_100FH_11 -> 0.981787 DO_104FH_12 -> 0.948659 DO_106FH_12 -> 0.954115 DO_108FH_12 -> 1.023612 DO_114FH_13 -> 0.984519 DO_120FH_14 -> 0.984600 DO_121FH_14 -> 1.008163 DO_133FH_15 -> 0.967360 DO_140FH_16 -> 1.011838 DO_141FH_16 -> 0.943400 DO_142FH_16 -> 1.001501 DO_143FH_16 -> 0.992222 DO_147FH_16 -> 0.987025 DO_149FH_16 -> 0.993978 DO_150FH_16 -> 1.019420 DO_154FH_17 -> 1.012282 DO_159FH_17 -> 1.009646 DO_170FH_18 -> 0.962362 DO_183FH_20 -> 1.026280 DO_184FH_20 -> 0.994014 DO_42FH_21 -> 0.966833 DO_145FH_21 -> 1.018111 DO_191FH_21 -> 1.024951 DO_12FH_21 -> 0.986874 DO_194FH_21 -> 0.945493 DO_6FS_1 -> 1.030910 DO_9FS_1 -> 0.955283 DO_10FS_1 -> 1.001901 DO_18FS_2 -> 0.985576 DO_19FS_2 -> 1.036726 DO_21FS_3 -> 1.020871 DO_22FS_3 -> 1.058142 DO_23FS_3 -> 1.055496 DO_32FS_4 -> 0.991449 DO_33FS_4 -> 0.995002 DO_34FS_4 -> 1.038391 DO_37FS_4 -> 0.977957 DO_41FS_5 -> 1.027147 DO_44FS_5 -> 0.985570 DO_45FS_5 -> 1.005040 DO_51FS_6 -> 1.006727 DO_53FS_6 -> 1.010616 DO_54FS_6 -> 1.022028 DO_59FS_7 -> 0.960848 DO_62FS_7 -> 1.070777 DO_64FS_7 -> 1.087173 DO_66FS_8 -> 1.027397 DO_72FS_8 -> 1.011194 DO_74FS_9 -> 0.981085 DO_77FS_9 -> 1.013850 DO_79FS_9 -> 0.967819 DO_85FS_10 -> 1.000596 DO_93FS_11 -> 0.979109 DO_99FS_11 -> 1.028342 DO_101FS_12 -> 1.014925 DO_105FS_12 -> 1.010641 DO_112FS_13 -> 1.014229 DO_116FS_13 -> 1.005710 DO_122FS_14 -> 1.018276 DO_128FS_14 -> 0.962016 DO_130FS_15 -> 1.042448 DO_131FS_15 -> 1.057927 DO_132FS_15 -> 0.952821 DO_146FS_16 -> 1.049366 DO_151FS_17 -> 1.030471 DO_152FS_17 -> 1.002784 DO_153FS_17 -> 0.983314 DO_161FS_18 -> 1.061958 DO_163FS_18 -> 0.991630 DO_167FS_18 -> 0.979244 DO_174FS_19 -> 0.995194 DO_179FS_19 -> 0.969693 DO_180FS_19 -> 0.993356 DO_188FS_20 -> 1.051647 DO_62FS_21 -> 1.063342 DO_193FS_21 -> 0.960608 DO_1MH_1 -> 0.991808 DO_2MH_1 -> 1.007386 DO_4MH_1 -> 1.054389 DO_7MH_1 -> 0.986407 DO_16MH_2 -> 0.968672 DO_4MH_2 -> 1.045127 DO_24MH_3 -> 0.947413 DO_25MH_3 -> 0.904260 DO_27MH_3 -> 0.969842 DO_4MH_3 -> 1.028921 DO_31MH_4 -> 0.939767 DO_35MH_4 -> 1.026958 DO_4MH_4 -> 1.000710 DO_39MH_5 -> 1.009363 DO_43MH_5 -> 0.920914 DO_4MH_5 -> 1.077082 DO_50MH_6 -> 0.951457 DO_4MH_6 -> 1.065543 DO_57MH_7 -> 0.982873 DO_63MH_7 -> 0.965085 DO_4MH_7 -> 1.095009 DO_67MH_8 -> 0.933068 DO_71MH_8 -> 0.920715 DO_4MH_8 -> 1.096442 DO_82MH_9 -> 1.008026 DO_4MH_9 -> 0.976577 DO_84MH_10 -> 1.023692 DO_86MH_10 -> 1.009116 DO_89MH_10 -> 0.989812 DO_4MH_10 -> 1.025060 DO_97MH_11 -> 0.972899 DO_4MH_11 -> 1.045789 DO_103MH_12 -> 0.945636 DO_107MH_12 -> 1.039983 DO_4MH_12 -> 1.037135 DO_110MH_13 -> 1.000413 DO_111MH_13 -> 0.995922 DO_117MH_13 -> 0.972864 DO_119MH_13 -> 0.961905 DO_125MH_14 -> 0.989164 DO_129MH_14 -> 0.989137 DO_134MH_15 -> 1.062146 DO_138MH_15 -> 1.007639 DO_148MH_16 -> 1.015622 DO_162MH_18 -> 1.057722 DO_165MH_18 -> 1.013339 DO_168MH_18 -> 1.010728 DO_169MH_18 -> 0.895549 DO_172MH_19 -> 1.057313 DO_175MH_19 -> 1.011529 DO_178MH_19 -> 0.938912 DO_181MH_20 -> 0.965548 DO_182MH_20 -> 0.967230 DO_185MH_20 -> 0.967491 DO_186MH_20 -> 1.011646 DO_189MH_20 -> 0.953464 DO_4MH_21 -> 1.028203 DO_192MH_21 -> 1.020534 DO_5MS_1 -> 1.026961 DO_8MS_1 -> 0.947631 DO_11MS_2 -> 1.014599 DO_13MS_2 -> 1.039804 DO_15MS_2 -> 0.999396 DO_20MS_3 -> 0.997445 DO_28MS_3 -> 0.976540 DO_29MS_4 -> 1.021256 DO_30MS_4 -> 0.974678 DO_36MS_4 -> 1.009436 DO_46MS_5 -> 0.975054 DO_48MS_6 -> 1.018340 DO_52MS_6 -> 1.009527 DO_56MS_7 -> 0.964938 DO_65MS_8 -> 1.027164 DO_70MS_8 -> 0.964997 DO_73MS_8 -> 1.073179 DO_76MS_9 -> 0.990650 DO_80MS_9 -> 0.988261 DO_81MS_9 -> 1.032662 DO_91MS_10 -> 0.954575 DO_92MS_11 -> 1.040457 DO_98MS_11 -> 0.993163 DO_102MS_12 -> 1.019035 DO_109MS_12 -> 1.041566 DO_113MS_13 -> 1.031448 DO_115MS_13 -> 0.973535 DO_118MS_13 -> 0.995486 DO_123MS_14 -> 0.993520 DO_124MS_14 -> 0.990760 DO_126MS_14 -> 1.026307 DO_127MS_14 -> 1.031109 DO_135MS_15 -> 0.956749 DO_136MS_15 -> 1.000785 DO_137MS_15 -> 0.990433 DO_139MS_15 -> 0.953216 DO_144MS_16 -> 0.969354 DO_155MS_17 -> 0.946469 DO_156MS_17 -> 0.999667 DO_157MS_17 -> 1.018869 DO_158MS_17 -> 1.009214 DO_160MS_17 -> 0.984301 DO_164MS_18 -> 1.050150 DO_166MS_18 -> 0.997550 DO_171MS_19 -> 1.018733 DO_173MS_19 -> 1.007139 DO_176MS_19 -> 0.996260 DO_177MS_19 -> 1.043433 DO_187MS_20 -> 1.037368 DO_190MS_20 -> 0.988939 DO_52MS_21 -> 1.008563
Boxplots and normalization factors suggest okay sample-to-sample similarity after IRS and TMM adjustments. The clustering looks biological (not by plex). We saw strong sex effects in the CC experiment, and we see that here with male samples on the right and female samples on the left. The diet choice shows top-to-bottom separation in both sexes, although the relative degree of variation is a lot larger for the x-axis than for the y-axis.
The distributions of Coefficients of Variation (CVs) are another way to get an idea of how individual proteins are behaving. This is an effective way to assess proper normalization in these experiments. We will compute CV distributions for each of the biological conditions.
# put CVs in data frames to simplify plots and summaries
# we will skip the n=1 group (TOFPS)
cv_tmm <- data.frame(FH = CV(tmt_tmm[FH]), FS = CV(tmt_tmm[FS]),
MH = CV(tmt_tmm[MH]), MS = CV(tmt_tmm[MS]))
medians <- apply(cv_tmm, 2, FUN = median)
print("Final median CVs by condition (%)")
round(medians, 2)
[1] "Final median CVs by condition (%)"
The average median CV in this experiment is higher than we had for the CC experiment or the founder experiment. This probably reflects that the plexes are not as well balanced as in the two smaller experiments. It might also reflect more biological variability in the larger cohort of crosses.
We will look at the final IRS plus TMM normalized data here.
# see what the CV distibutions look like
# need long form for ggplot
long_cv_tmm <- gather(cv_tmm, key = "group", value = "cv")
# traditional boxplots
ggplot(long_cv_tmm, aes(x = group, y = cv, fill = group)) +
geom_boxplot(notch = TRUE) +
ggtitle("Final CV distributions")
# density plots
ggplot(long_cv_tmm, aes(x = cv, color = group)) +
geom_density() +
coord_cartesian(xlim = c(0, 100)) +
ggtitle("Final CV distributions")
Even though the median CVs are a little higher, the shapes of the four CV distributions are very similar, and most CVs are relatively small (under 30%).
We will need to draw some random subsets of samples in each group to have manageable scatter plot grids (5 or 6 sample is about as many as we can do).
# multi-panel scatter plot grids, final data (random draws of 5 samples per group)
pairs.panels(log10(tmt_tmm[sample(FH, 5)]), lm = TRUE, main = "Female, high-fat diet - final")
pairs.panels(log10(tmt_tmm[sample(FS, 5)]), lm = TRUE, main = "Female, standard diet - final")
pairs.panels(log10(tmt_tmm[sample(MH, 5)]), lm = TRUE, main = "Male, high-fat diet - final")
pairs.panels(log10(tmt_tmm[sample(MS, 5)]), lm = TRUE, main = "Male, standard diet - final")
The scatter plots in each group show a little more scatter than we saw in the two smaller experiments. This might reflect the increased biological diversity in this experiment.
We can get a little heads up on the statistical testing by comparing group averages to each other.
NOTE: Correlation coefficients are not very informative in the scatter plot grids. The liver samples from the mice co-vary with protein relative abundance, so the coefficients are always close to 1.0. The (subjective) visual impression of degree of scatter is a more useful metric.
# save the dataframe with the averages by developmental age
ave_tmm <- cbind(FH_ave = rowMeans(tmt_tmm[FH]),
FS_ave = rowMeans(tmt_tmm[FS]),
MH_ave = rowMeans(tmt_tmm[MH]),
MS_ave = rowMeans(tmt_tmm[MS]))
# make a scatter plot grid for the group averages
pairs.panels(log10(ave_tmm), lm = TRUE, main = "Group Averages - final")
The scatter plots of group averages indicate that the sex differences are quite a bit larger than the diet effect. The upper left and lower right scatter plots are tight to the diagonal trend line indicating a relatively small diet effect. Scatter plots between sexes, independent of diet, show many more proteins having larger abundance changes. Quality control checks seem okay. We can move on to any statistical testing.
sessionInfo()
R version 4.4.0 (2024-04-24) Platform: x86_64-apple-darwin20 Running under: macOS 15.7.2 Matrix products: default BLAS: /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRblas.0.dylib LAPACK: /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.0 locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 time zone: America/Los_Angeles tzcode source: internal attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] psych_2.4.3 edgeR_4.2.0 limma_3.60.2 scales_1.3.0 [5] lubridate_1.9.3 forcats_1.0.0 stringr_1.5.1 dplyr_1.1.4 [9] purrr_1.0.2 readr_2.1.5 tidyr_1.3.1 tibble_3.2.1 [13] ggplot2_3.5.1 tidyverse_2.0.0 loaded via a namespace (and not attached): [1] utf8_1.2.4 generics_0.1.3 stringi_1.8.4 lattice_0.22-6 [5] hms_1.1.3 digest_0.6.35 magrittr_2.0.3 evaluate_0.23 [9] grid_4.4.0 timechange_0.3.0 pbdZMQ_0.3-11 fastmap_1.2.0 [13] jsonlite_1.8.8 fansi_1.0.6 mnormt_2.1.1 cli_3.6.2 [17] rlang_1.1.3 crayon_1.5.2 bit64_4.0.5 munsell_0.5.1 [21] base64enc_0.1-3 withr_3.0.0 repr_1.1.7 tools_4.4.0 [25] parallel_4.4.0 tzdb_0.4.0 uuid_1.2-0 colorspace_2.1-0 [29] locfit_1.5-9.9 IRdisplay_1.1 vctrs_0.6.5 R6_2.5.1 [33] lifecycle_1.0.4 bit_4.0.5 vroom_1.6.5 pkgconfig_2.0.3 [37] pillar_1.9.0 gtable_0.3.5 glue_1.7.0 Rcpp_1.0.12 [41] statmod_1.5.0 tidyselect_1.2.1 IRkernel_1.3.2 farver_2.1.2 [45] htmltools_0.5.8.1 nlme_3.1-164 labeling_0.4.3 compiler_4.4.0